AITopics | stochastic momentum method

We analyze a class of stochastic gradient algorithms with momentum on a high-dimensional random least squares problem. Our framework, inspired by random matrix theory, provides an exact (deterministic) characterization for the sequence of function values produced by these algorithms which is expressed only in terms of the eigenvalues of the Hessian. This leads to simple expressions for nearly-optimal hyperparameters, a description of the limiting neighborhood, and average-case complexity. As a consequence, we show that (small-batch) stochastic heavy-ball momentum with a fixed momentum parameter provides no actual performance improvement over SGD when step sizes are adjusted correctly. For contrast, in the non-strongly convex setting, it is possible to get a large improvement over SGD using momentum. By introducing hyperparameters that depend on the number of samples, we propose a new algorithm sDANA (stochastic dimension adjusted Nesterov acceleration) which obtains an asymptotically optimal average-case complexity while remaining linearly convergent in the strongly convex setting without adjusting parameters.

large-scale, name change, stochastic momentum method, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)

Add feedback

Stochastic Momentum Methods for Non-smooth Non-Convex Finite-Sum Coupled Compositional Optimization

Chen, Xingyu, Wang, Bokun, Yang, Ming, Lin, Qihang, Yang, Tianbao

arXiv.org Artificial IntelligenceOct-30-2025

Finite-sum Coupled Compositional Optimization (FCCO), characterized by its coupled compositional objective structure, emerges as an important optimization paradigm for addressing a wide range of machine learning problems. In this paper, we focus on a challenging class of non-convex non-smooth FCCO, where the outer functions are non-smooth weakly convex or convex and the inner functions are smooth or weakly convex. Existing state-of-the-art result face two key limitations: (1) a high iteration complexity of $O(1/ε^6)$ under the assumption that the stochastic inner functions are Lipschitz continuous in expectation; (2) reliance on vanilla SGD-type updates, which are not suitable for deep learning applications. Our main contributions are two fold: (i) We propose stochastic momentum methods tailored for non-smooth FCCO that come with provable convergence guarantees; (ii) We establish a new state-of-the-art iteration complexity of $O(1/ε^5)$. Moreover, we apply our algorithms to multiple inequality constrained non-convex optimization problems involving smooth or weakly convex functional inequality constraints. By optimizing a smoothed hinge penalty based formulation, we achieve a new state-of-the-art complexity of $O(1/ε^5)$ for finding an (nearly) $ε$-level KKT solution. Experiments on three tasks demonstrate the effectiveness of the proposed algorithms.

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Artificial Intelligence

2506.02504

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models

Neural Information Processing SystemsOct-10-2024, 07:12:12 GMT

We analyze a class of stochastic gradient algorithms with momentum on a high-dimensional random least squares problem. Our framework, inspired by random matrix theory, provides an exact (deterministic) characterization for the sequence of function values produced by these algorithms which is expressed only in terms of the eigenvalues of the Hessian. This leads to simple expressions for nearly-optimal hyperparameters, a description of the limiting neighborhood, and average-case complexity. As a consequence, we show that (small-batch) stochastic heavy-ball momentum with a fixed momentum parameter provides no actual performance improvement over SGD when step sizes are adjusted correctly. For contrast, in the non-strongly convex setting, it is possible to get a large improvement over SGD using momentum.

large-scale, quadratic model, stochastic momentum method, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.45)

Add feedback

Last-iterate convergence analysis of stochastic momentum methods for neural networks

Xu, Dongpo, Liu, Jinlan, Lu, Yinghua, Kong, Jun, Mandic, Danilo

arXiv.org Artificial IntelligenceMay-29-2022

The stochastic momentum method is a commonly used acceleration technique for solving large-scale stochastic optimization problems in artificial neural networks. Current convergence results of stochastic momentum methods under non-convex stochastic settings mostly discuss convergence in terms of the random output and minimum output. To this end, we address the convergence of the last iterate output (called last-iterate convergence) of the stochastic momentum methods for non-convex stochastic optimization problems, in a way conformal with traditional optimization theory. We prove the last-iterate convergence of the stochastic momentum methods under a unified framework, covering both stochastic heavy ball momentum and stochastic Nesterov accelerated gradient momentum. The momentum factors can be fixed to be constant, rather than time-varying coefficients in existing analyses. Finally, the last-iterate convergence of the stochastic momentum methods is verified on the benchmark MNIST and CIFAR-10 datasets.

artificial intelligence, last-iterate convergence analysis, machine learning, (2 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.neucom.2023.01.032

2205.14811

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)

Add feedback

Universal Stagewise Learning for Non-Convex Problems with Convergence on Averaged Solutions

Chen, Zaiyi, Yang, Tianbao, Yi, Jinfeng, Zhou, Bowen, Chen, Enhong

arXiv.org Machine LearningSep-4-2018

Although stochastic gradient descent (SGD) method and its variants (e.g., stochastic momentum methods, AdaGrad) are the choice of algorithms for solving non-convex problems (especially deep learning), there still remain big gaps between the theory and the practice with many questions unresolved. For example, there is still a lack of theories of convergence for SGD and its variants that use stagewise step size and return an averaged solution in practice. In addition, theoretical insights of why adaptive step size of AdaGrad could improve non-adaptive step size of {\sgd} is still missing for non-convex optimization. This paper aims to address these questions and fill the gap between theory and practice. We propose a universal stagewise optimization framework for a broad family of {\bf non-smooth non-convex} (namely weakly convex) problems with the following key features: (i) at each stage any suitable stochastic convex optimization algorithms (e.g., SGD or AdaGrad) that return an averaged solution can be employed for minimizing a regularized convex problem; (ii) the step size is decreased in a stagewise manner; (iii) an averaged solution is returned as the final solution that is selected from all stagewise averaged solutions with sampling probabilities {\it increasing} as the stage number. Our theoretical results of stagewise AdaGrad exhibit its adaptive convergence, therefore shed insights on its faster convergence for problems with sparse stochastic gradients than stagewise SGD. To the best of our knowledge, these new results are the first of their kind for addressing the unresolved issues of existing theories mentioned earlier.

artificial intelligence, convergence, machine learning, (17 more...)

arXiv.org Machine Learning

1808.06296

Country:

Asia > China (0.04)
North America > United States > Iowa (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

A Unified Analysis of Stochastic Momentum Methods for Deep Learning

Yan, Yan, Yang, Tianbao, Li, Zhe, Lin, Qihang, Yang, Yi

arXiv.org Machine LearningAug-30-2018

Stochastic momentum methods have been widely adopted in training deep neural networks. However, their theoretical analysis of convergence of the training objective and the generalization error for prediction is still under-explored. This paper aims to bridge the gap between practice and theory by analyzing the stochastic gradient (SG) method, and the stochastic momentum methods including two famous variants, i.e., the stochastic heavy-ball (SHB) method and the stochastic variant of Nesterov's accelerated gradient (SNAG) method. We propose a framework that unifies the three variants. We then derive the convergence rates of the norm of gradient for the non-convex optimization problem, and analyze the generalization performance through the uniform stability approach. Particularly, the convergence analysis of the training objective exhibits that SHB and SNAG have no advantage over SG. However, the stability analysis shows that the momentum term can improve the stability of the learned model and hence improve the generalization performance. These theoretical insights verify the common wisdom and are also corroborated by our empirical analysis on deep learning.

artificial intelligence, generalization error, machine learning, (19 more...)

arXiv.org Machine Learning

1808.10396

Country:

North America > United States > Iowa (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre:

Research Report (0.82)
Instructional Material (0.54)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Unified Convergence Analysis of Stochastic Momentum Methods for Convex and Non-convex Optimization

Yang, Tianbao, Lin, Qihang, Li, Zhe

arXiv.org Machine LearningMay-4-2016

Recently, {\it stochastic momentum} methods have been widely adopted in training deep neural networks. However, their convergence analysis is still underexplored at the moment, in particular for non-convex optimization. This paper fills the gap between practice and theory by developing a basic convergence analysis of two stochastic momentum methods, namely stochastic heavy-ball method and the stochastic variant of Nesterov's accelerated gradient method. We hope that the basic convergence results developed in this paper can serve the reference to the convergence of stochastic momentum methods and also serve the baselines for comparison in future development of stochastic momentum methods. The novelty of convergence analysis presented in this paper is a unified framework, revealing more insights about the similarities and differences between different stochastic momentum methods and stochastic gradient method. The unified framework exhibits a continuous change from the gradient method to Nesterov's accelerated gradient method and finally the heavy-ball method incurred by a free parameter, which can help explain a similar change observed in the testing error convergence behavior for deep learning. Furthermore, our empirical results for optimizing deep neural networks demonstrate that the stochastic variant of Nesterov's accelerated gradient method achieves a good tradeoff (between speed of convergence in training error and robustness of convergence in testing error) among the three stochastic methods.

artificial intelligence, machine learning, optimization, (15 more...)

arXiv.org Machine Learning

1604.03257

Country: North America > United States > Iowa (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

Filters

Collaborating Authors

stochastic momentum method

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

4cf0ed8641cfcbbf46784e620a0316fb-Paper.pdf

4cf0ed8641cfcbbf46784e620a0316fb-Paper.pdf

Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models

Stochastic Momentum Methods for Non-smooth Non-Convex Finite-Sum Coupled Compositional Optimization

Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models

Last-iterate convergence analysis of stochastic momentum methods for neural networks

Universal Stagewise Learning for Non-Convex Problems with Convergence on Averaged Solutions

A Unified Analysis of Stochastic Momentum Methods for Deep Learning

Unified Convergence Analysis of Stochastic Momentum Methods for Convex and Non-convex Optimization